New probabilistic interest measures for association rules

Authors

  • Michael Hahsler
  • Kurt Hornik
Abstract

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. We start this paper by presenting a simple probabilistic framework for transaction data, which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules, and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
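
The following is a minimal Python sketch of the quantities named in the abstract, under stated assumptions. It simulates transactions with no associations using a fixed number of transactions and independent Bernoulli item occurrences (a simplification, not necessarily the paper's exact framework). It computes confidence and lift from the counts c_X, c_Y and c_XY of a rule X => Y among m transactions. It then computes hyper-lift and hyper-confidence from the hypergeometric distribution of the co-occurrence count C_XY under independence. The hypergeometric formulation and the quantile delta = 0.99 follow our reading of how these measures are commonly described (for example, in the documentation of the arules R package); treat them as assumptions rather than a transcription of the paper. All function names and item probabilities are hypothetical.

```python
import math
import random


def simulate_independent_transactions(item_probs, n_transactions, seed=0):
    """Simulate transactions in which every item occurs independently.

    Simplifying assumption: a fixed number of transactions and one
    independent Bernoulli draw per item, so the data contain no associations.
    """
    rng = random.Random(seed)
    return [
        {item for item, p in item_probs.items() if rng.random() < p}
        for _ in range(n_transactions)
    ]


def rule_counts(transactions, lhs, rhs):
    """Return (m, c_X, c_Y, c_XY) for the rule X => Y."""
    lhs, rhs = set(lhs), set(rhs)
    both = lhs | rhs
    m = len(transactions)
    c_x = sum(1 for t in transactions if lhs <= t)
    c_y = sum(1 for t in transactions if rhs <= t)
    c_xy = sum(1 for t in transactions if both <= t)
    return m, c_x, c_y, c_xy


def confidence(m, c_x, c_y, c_xy):
    # conf(X => Y) = supp(X u Y) / supp(X) = c_XY / c_X
    return c_xy / c_x


def lift(m, c_x, c_y, c_xy):
    # lift = supp(X u Y) / (supp(X) * supp(Y)) = m * c_XY / (c_X * c_Y)
    return m * c_xy / (c_x * c_y)


def hyper_pmf(r, m, c_x, c_y):
    """P[C_XY = r] if the c_X transactions containing X are a random subset
    of the m transactions, c_Y of which contain Y (hypergeometric)."""
    return math.comb(c_y, r) * math.comb(m - c_y, c_x - r) / math.comb(m, c_x)


def support_range(m, c_x, c_y):
    """Possible values of C_XY given the item counts."""
    return range(max(0, c_x + c_y - m), min(c_x, c_y) + 1)


def hyper_quantile(delta, m, c_x, c_y):
    """Smallest r with P[C_XY <= r] >= delta."""
    cumulative = 0.0
    for r in support_range(m, c_x, c_y):
        cumulative += hyper_pmf(r, m, c_x, c_y)
        if cumulative >= delta:
            return r
    return min(c_x, c_y)  # fallback if rounding keeps the sum just below 1


def hyper_lift(m, c_x, c_y, c_xy, delta=0.99):
    """c_XY divided by the delta-quantile of C_XY (delta = 0.99 assumed)."""
    q = hyper_quantile(delta, m, c_x, c_y)
    if q == 0:  # guard added for this sketch: the ratio is undefined for q == 0
        return math.inf if c_xy > 0 else math.nan
    return c_xy / q


def hyper_confidence(m, c_x, c_y, c_xy):
    """P[C_XY < c_XY]: chance of fewer co-occurrences under independence."""
    lo = max(0, c_x + c_y - m)
    return sum(hyper_pmf(r, m, c_x, c_y) for r in range(lo, c_xy))


if __name__ == "__main__":
    # Hypothetical item probabilities; no associations exist by construction.
    probs = {"milk": 0.3, "bread": 0.25, "beer": 0.05}
    data = simulate_independent_transactions(probs, 1000, seed=42)
    counts = rule_counts(data, {"milk"}, {"bread"})
    print("confidence      ", round(confidence(*counts), 3))
    print("lift            ", round(lift(*counts), 3))
    print("hyper-lift      ", round(hyper_lift(*counts), 3))
    print("hyper-confidence", round(hyper_confidence(*counts), 3))
```

In this sketch, lift equals c_XY / E[C_XY] because E[C_XY] = c_X c_Y / m under independence; hyper-lift replaces that expectation with a high quantile, so small counts are less likely to push it above 1 by chance. A hyper-confidence close to 1 means c_XY is unusually large relative to independence, so a threshold such as 0.99 could be used to filter rules.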

Similar resources

Implications of Probabilistic Data Modeling for Mining Association Rules

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine association rules are discussed in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a re...

A New Probabilistic Measure of Interestingness for Association Rules, Based on the Likelihood of the Link

The interestingness measures for pattern associations proposed in the data mining literature depend only on the observation of relative frequencies obtained from 2×2 contingency tables. They can be called “absolute measures”. The underlying scale of such a measure makes statistical decisions difficult. In this paper we present the foundations and the construction of a probabilistic interestingn...

ION: a pertinent new measure for mining information from many types of data

Since the last decade, many methods with appropriate measures have been proposed in knowledge discovery in databases. These measures aim at both improving the quality of mined association rules and reducing the problem of many nested rules. This paper presents a new statistical Implication Oriented Normalized measure, denoted ION. ION turns out to be a unifying framework for several probabilistic measures of...

A Study on Post mining of Association Rules Targeting User Interest

Association rule mining means discovering interesting patterns within large databases. Association rules are used in many application areas, such as market basket analysis, web log analysis and protein substructures. Several post-processing methods were developed to reduce the number of rules using nonredundant rules or pruning techniques such as pruning, summarizing, grouping or visualization based...

A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluating the performance of a set of peer entities, called decision-making units (DMUs), that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...

Journal:
  • Intell. Data Anal.

Volume: 11

Publication date: 2007